6.3.9  Motif Discovery

The major goal of ChIP-Seq is to determine the binding sites where TFs, Pol II, and
histone marks interact with the genomic DNA to control the transcription of genes. Those
sites have sequence patterns that are recognized by the targeted proteins. A genomic
sequence pattern that has such biological activity is called a motif. Motifs are usually
found in the regulatory regions of genes, so they are most likely to be found in the
peak enrichment regions. Motif enrichment analysis is used to detect enrichment of
known binding motifs in the regulatory regions of genes. Researchers use it to detect
the binding site patterns of a library of known TFs that are believed to regulate
a specific set of genes. Motifs are searched for within a window of a specified size
around the ChIP-Seq peaks. Remember that the peak enrichment is stored in the
“*peaks.narrowPeak” files. However, motif detection programs require FASTA sequences
as input. Therefore, we need to generate FASTA sequences from BED files. We can create
the BED files by extracting the first three columns (chromosome, start, and end) from
the “*peaks.narrowPeak” files as follows:

mkdir motifs

cut -f 1,2,3 \
    macs3output/chip1_peaks.narrowPeak \
    > motifs/chip1_peaks.bed

cut -f 1,2,3 \
    macs3output/chip2_peaks.narrowPeak \
    > motifs/chip2_peaks.bed

cut -f 1,2,3 \
    macs3output/chip3_peaks.narrowPeak \
    > motifs/chip3_peaks.bed

The mkdir and cut commands above create a new directory, “motifs”, and store the newly
created BED files in it. We will extract FASTA sequences from each of these three files
using bedtools, a collection of programs for manipulating genomic interval files such as
BED files. On Ubuntu, you can install bedtools using “apt-get install bedtools”.

Visit the program website “https://bedtools.readthedocs.io/en/latest/content/installation.html”
for more information.
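The BED files created above cover each peak in full. Because motifs are searched within a window of a specified size around the peaks, one optional variation is to center a fixed-width window on each peak summit instead. The following is a minimal sketch, not part of the original workflow. It assumes the standard narrowPeak layout, in which column 10 holds the summit offset from the start coordinate (or -1 when no summit was called). The function name “summit_window” and the half-width argument are hypothetical names introduced here for illustration.

```shell
# summit_window FILE HALF_WIDTH   (hypothetical helper, for illustration only)
# Prints a three-column BED interval of 2 * HALF_WIDTH bases centered on
# each peak summit. Assumes column 10 of the narrowPeak file is the
# 0-based summit offset from the start coordinate in column 2.
summit_window() {
    awk -v w="$2" 'BEGIN { OFS = "\t" }
        $10 >= 0 {                      # skip peaks without a called summit (-1)
            summit = $2 + $10           # absolute summit position
            start = summit - w
            if (start < 0) start = 0    # do not run past the chromosome start
            print $1, start, summit + w
        }' "$1"
}
```

For example, “summit_window macs3output/chip1_peaks.narrowPeak 50 > motifs/chip1_summits.bed” would write 100-base windows; the output file name is only illustrative. The sketch does not clamp window ends to chromosome lengths, so windows near the end of a chromosome may need separate handling.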

The “bedtools getfasta” command extracts the sequence of every interval in a BED file
and writes the sequences to a FASTA file. This command requires the FASTA file of the
reference sequence and a BED file as input. We will use the same reference sequence
that we used to generate the BAM files.
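Because getfasta looks up each BED interval by chromosome name in the reference FASTA file, the names in the two files must match (for example, “chr1” versus “1”); intervals on chromosomes that are missing from the reference are skipped. The following is a minimal sketch of a check that could be run beforehand. The function name “missing_chroms” is a hypothetical helper introduced here, and the check assumes that the chromosome name is the first word of each FASTA header line.

```shell
# missing_chroms BED FASTA   (hypothetical helper, for illustration only)
# Prints each chromosome name that appears in column 1 of the BED file
# but not as the first word of any FASTA header line.
missing_chroms() {
    awk '
        NR == FNR {                         # first file: the reference FASTA
            if (/^>/) seen[substr($1, 2)] = 1
            next
        }
        !($1 in seen) && !reported[$1]++ {  # second file: the BED intervals
            print $1
        }
    ' "$2" "$1"
}
```

Running “missing_chroms motifs/chip1_peaks.bed ref/hg19.fa” should print nothing when all names match. Scanning a whole-genome FASTA file this way can take a while, since every line of the reference is read.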

bedtools getfasta \
    -fi ref/hg19.fa \
    -bed motifs/chip1_peaks.bed \
    -fo motifs/chip1_peaks.fasta

bedtools getfasta \
    -fi ref/hg19.fa \
    -bed motifs/chip2_peaks.bed \
    -fo motifs/chip2_peaks.fasta